
Context

This dataset contains abstracts describing accidents and injuries among construction workers from 2015 to 2017. Alongside the unstructured text abstracts there are structured fields, such as Degree of Injury, Body Part(s) Affected, and Construction End Use.

What trends do we see in injuries in terms of time of day?

What is the reason injuries are occurring?

Which factors have the greatest impact on a construction accident?

How accurately do machine learning models predict accidents and injuries in the construction industry?

Import Libraries For Overview & EDA

Get The Dataset

Overview Of The Dataset

Data Cleaning

Columns

summary_nr

Event Date

Abstract Text

Event Description

Event Keywords

Construction End Use

Building Stories

Project Cost

Project Type

Degree of Injury

Nature of Injury

Part of Body

Event type

Environmental Factor

Human Factor

Task Assigned

hazsub

fat_cause

fall_ht

Drop Non-Important Columns

EDA: Exploratory Data Analysis

Auto EDA with Sweetviz

Analyzing the Dataset with Sweetviz

Comparing the Outcomes of Target Feature (Fatal & NonFatal)

Association Rule Mining: Apriori

Build Dataset For Predicting

Hash Encoding
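
As a rough illustration of the idea behind hash encoding, the sketch below maps each categorical value to one of a fixed number of buckets with a hash function, so the encoded width stays constant no matter how many distinct categories appear. The function name `hash_encode`, the bucket count, and the sample values are assumptions made for this example; the notebook's own encoder and settings are not shown here and may differ.

```python
import hashlib


def hash_encode(values, n_components=8):
    """Encode categorical values as fixed-length indicator vectors via hashing.

    Hypothetical helper for illustration only; n_components is an assumed
    bucket count, not a value taken from the notebook.
    """
    rows = []
    for value in values:
        # A stable hash (unlike Python's salted built-in hash()) keeps the
        # encoding reproducible across runs.
        digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % n_components
        row = [0] * n_components
        row[bucket] = 1
        rows.append(row)
    return rows


# Made-up category labels, used only to exercise the function.
encoded = hash_encode(["Fall", "Struck by", "Fall"], n_components=8)
```

Identical categories always land in the same bucket, while different categories can collide; a larger `n_components` lowers the collision rate at the cost of more columns.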

Train Valid Test Split

Train Dataset:

The set of data the model learns from, that is, the data used to fit the model's parameters.

Valid Dataset:

The set of data used to evaluate a model fitted on the training dataset while its hyperparameters are being tuned. It also plays a role in other forms of model preparation, such as feature selection and choosing a threshold cut-off.

Test Dataset:

The set of data used to provide an unbiased evaluation of the final model fitted on the training dataset. It is held back until all tuning is finished.
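
The three-way split described above can be sketched without any library as follows. The function name, the 70/15/15 proportions, and the seed are assumptions chosen for illustration, not the notebook's actual values; in practice scikit-learn's `train_test_split` applied twice is a common way to get the same result.

```python
import random


def train_valid_test_split(rows, valid_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle rows once and cut them into train, valid, and test subsets.

    Hypothetical helper; the fractions and seed are assumed defaults.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible

    n_test = round(len(rows) * test_frac)
    n_valid = round(len(rows) * valid_frac)

    test_idx = indices[:n_test]
    valid_idx = indices[n_test:n_test + n_valid]
    train_idx = indices[n_test + n_valid:]  # everything left over

    def pick(idx):
        return [rows[i] for i in idx]

    return pick(train_idx), pick(valid_idx), pick(test_idx)


# Stand-in data: 100 integer "rows" instead of real accident records.
rows = list(range(100))
train, valid, test = train_valid_test_split(rows)
```

Because every row ends up in exactly one subset, the test set never influences fitting or tuning.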

Predictions

Logistic Regression

K Nearest Neighbors

Decision Tree

Hyperparameters For Decision Tree

Random Forest

Hyperparameters For Random Forest

Support Vector Machine

Hyperparameters For Support Vector Machine Classification

Gaussian Naive Bayes

Bernoulli Naive Bayes

Hyperparameters For Bernoulli Naive Bayes

Bagging Classifier Algorithm

Gradient Boosting Classifier Algorithm

XGBoost Classifier

Voting Classifier

Test Models

0. Logistic Regression

1. Random Forest Classifier

2. XGBoost Classifier

3. Gradient Boosting Classifier

4. Bagging Classifier

5. Support Vector Machine

6. Decision Tree Classifier

7. Gaussian Naive Bayes

8. Bernoulli Naive Bayes

9. Voting Classifier

Cross Validation: Evaluating Estimator Performance
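
To make the mechanics of cross-validation concrete, here is a minimal sketch of how k-fold splitting partitions sample indices, assuming plain (unshuffled, unstratified) folds. The function name and the choice of `k=5` are assumptions for this example; the notebook's actual procedure is not shown, and scikit-learn's `cross_val_score` is a common way to run the whole loop.

```python
def k_fold_indices(n_samples, k=5):
    """Return a list of (training_indices, held_out_indices) pairs.

    Hypothetical helper; each sample is held out exactly once.
    """
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    folds = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        held_out = list(range(start, stop))
        training = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((training, held_out))
        start = stop
    return folds


# Ten samples split into five folds of two held-out samples each.
folds = k_fold_indices(10, k=5)
```

A model is fitted on each `training` list and scored on the matching `held_out` list; averaging the k scores gives a steadier estimate than a single split.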

Comparing Machine Learning Algorithms

END =)